An Ensemble Information Extraction Approach to the BioCreative CHEMDNER Task
Authors
Abstract
We report on the Penn State team's experience in the CHEMDNER chemical entity mention and chemical document indexing tasks. Our approach uses a probabilistic framework that combines an ensemble of multiple information extractors to obtain high accuracy. The framework can be configured to optimize for precision, recall, or F-measure, depending on the task requirement. The ensemble includes off-the-shelf chemical entity extractors, along with a version of the ChemXSeer extractor that was trained and modified specifically for this task. In experiments on the training and development datasets, the system reaches recall as high as 89% and an F-measure of 73% when optimized for each measure, respectively.
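The abstract does not say how the extractor outputs are combined, so the following is only a minimal sketch of one way such a configurable ensemble could work, not the authors' method. It assumes a noisy-OR combination of per-extractor reliabilities and a threshold tuned on held-out data. The function names, the reliability values, and the candidate thresholds are all hypothetical.

```python
from collections import defaultdict


def combine_mentions(extractor_outputs, reliabilities):
    """Score each candidate span with a noisy-OR over the extractors that proposed it.

    extractor_outputs: dict mapping extractor name -> set of (start, end) spans.
    reliabilities: dict mapping extractor name -> assumed probability that a span
        proposed by that extractor is correct (hypothetical values).
    """
    miss_prob = defaultdict(lambda: 1.0)
    for name, spans in extractor_outputs.items():
        for span in spans:
            # Probability that every extractor proposing this span is wrong.
            miss_prob[span] *= 1.0 - reliabilities[name]
    return {span: 1.0 - p for span, p in miss_prob.items()}


def select_mentions(scores, threshold):
    """Keep the spans whose combined score reaches the threshold."""
    return {span for span, score in scores.items() if score >= threshold}


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 between two sets of spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def tune_threshold(scores, gold, objective="f1", candidates=(0.2, 0.4, 0.6, 0.8)):
    """Pick the candidate threshold that maximizes the chosen measure on gold data."""
    index = {"precision": 0, "recall": 1, "f1": 2}[objective]
    return max(
        candidates,
        key=lambda t: precision_recall_f1(select_mentions(scores, t), gold)[index],
    )
```

In this sketch, a low threshold keeps spans proposed by any single extractor, which favors recall, while a high threshold keeps only spans backed by reliable or multiple extractors, which favors precision. Choosing the threshold on a development set is what makes the target measure configurable.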
Similar resources
Extraction of Chemical and Drug Named Entities by Ensemble Learning Using Chemical NER Tools Based on Different Extraction Guidelines
Chemical named-entity recognition (chemical NER) is the task of extracting chemical information and chemical-related entities such as drug names and source materials from text in several domains such as bioinformatics and nanoinformatics. There have been several attempts to construct corpora for handling such chemical-related information based on different corpus-construction guidelines. Even t...
Mining Patents with tmChem, GNormPlus and an Ensemble of Open Systems
The significant amount of medicinal chemistry information contained in patents makes them an attractive target for text mining. The CHEMDNER task at BioCreative V focused on information extraction from patents. This manuscript describes our submissions to the CEMP (chemical named entity recognition) and GPRO (gene and related object identification) subtasks. Our CEMP submission is an ensemble of...
Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination
Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...
Chemical entity recognition in patents by combining dictionary-based and statistical approaches
We describe the development of a chemical entity recognition system and its application in the CHEMDNER-patent track of BioCreative 2015. This community challenge includes a Chemical Entity Mention in Patents (CEMP) recognition task and a Chemical Passage Detection (CPD) classification task. We addressed both tasks by an ensemble system that combines a dictionary-based approach with a statistic...
Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations
BACKGROUND: Chemical and biomedical Named Entity Recognition (NER) is an essential prerequisite task before effective text mining can begin for biochemical-text data. Exploiting unlabeled text data to leverage system performance has been an active and challenging research topic in text mining due to the recent growth in the amount of biomedical literature. We present a semi-supervised learning m...